Skip to content

[compiler-rt] Fix frame numbering for unparsable frames. #148278

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jschwartzentruber
Copy link

This can happen when JIT code is run, and we can't symbolize those frames, but they should remain numbered in the stack. An example spidermonkey trace:

    #0 0x564ac90fb80f  (/builds/worker/dist/bin/js+0x240e80f) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #1 0x564ac9223a64  (/builds/worker/dist/bin/js+0x2536a64) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #2 0x564ac922316f  (/builds/worker/dist/bin/js+0x253616f) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #3 0x564ac9eac032  (/builds/worker/dist/bin/js+0x31bf032) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #4 0x0dec477ca22e  (<unknown module>)

Without this change, the following symbolization is output:

    #0 0x55a6d72f980f in MOZ_CrashSequence /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:248:3
    #1 0x55a6d72f980f in Crash(JSContext*, unsigned int, JS::Value*) /builds/worker/checkouts/gecko/js/src/shell/js.cpp:4223:5
    #2 0x55a6d7421a64 in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:501:13
    #3 0x55a6d742116f in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:597:12
    #4 0x55a6d80aa032 in js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICFallbackStub*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) /builds/worker/checkouts/gecko/js/src/jit/BaselineIC.cpp:1705:10
    #4 0x2c803bd8f22e  (<unknown module>)

The last frame has a duplicate number. With this change the numbering is correct:

    #0 0x5620c58ec80f in MOZ_CrashSequence /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:248:3
    #1 0x5620c58ec80f in Crash(JSContext*, unsigned int, JS::Value*) /builds/worker/checkouts/gecko/js/src/shell/js.cpp:4223:5
    #2 0x5620c5a14a64 in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:501:13
    #3 0x5620c5a1416f in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:597:12
    #4 0x5620c669d032 in js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICFallbackStub*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) /builds/worker/checkouts/gecko/js/src/jit/BaselineIC.cpp:1705:10
    #5 0x349f24c7022e  (<unknown module>)

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Jesse Schwartzentruber (jschwartzentruber)

Changes

This can happen when JIT code is run, and we can't symbolize those frames, but they should remain numbered in the stack. An example spidermonkey trace:

    #<!-- -->0 0x564ac90fb80f  (/builds/worker/dist/bin/js+0x240e80f) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #<!-- -->1 0x564ac9223a64  (/builds/worker/dist/bin/js+0x2536a64) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #<!-- -->2 0x564ac922316f  (/builds/worker/dist/bin/js+0x253616f) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #<!-- -->3 0x564ac9eac032  (/builds/worker/dist/bin/js+0x31bf032) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #<!-- -->4 0x0dec477ca22e  (&lt;unknown module&gt;)

Without this change, the following symbolization is output:

    #<!-- -->0 0x55a6d72f980f in MOZ_CrashSequence /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:248:3
    #<!-- -->1 0x55a6d72f980f in Crash(JSContext*, unsigned int, JS::Value*) /builds/worker/checkouts/gecko/js/src/shell/js.cpp:4223:5
    #<!-- -->2 0x55a6d7421a64 in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&amp;) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:501:13
    #<!-- -->3 0x55a6d742116f in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&amp;, js::MaybeConstruct, js::CallReason) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:597:12
    #<!-- -->4 0x55a6d80aa032 in js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICFallbackStub*, unsigned int, JS::Value*, JS::MutableHandle&lt;JS::Value&gt;) /builds/worker/checkouts/gecko/js/src/jit/BaselineIC.cpp:1705:10
    #<!-- -->4 0x2c803bd8f22e  (&lt;unknown module&gt;)

The last frame has a duplicate number. With this change the numbering is correct:

    #<!-- -->0 0x5620c58ec80f in MOZ_CrashSequence /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:248:3
    #<!-- -->1 0x5620c58ec80f in Crash(JSContext*, unsigned int, JS::Value*) /builds/worker/checkouts/gecko/js/src/shell/js.cpp:4223:5
    #<!-- -->2 0x5620c5a14a64 in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&amp;) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:501:13
    #<!-- -->3 0x5620c5a1416f in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&amp;, js::MaybeConstruct, js::CallReason) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:597:12
    #<!-- -->4 0x5620c669d032 in js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICFallbackStub*, unsigned int, JS::Value*, JS::MutableHandle&lt;JS::Value&gt;) /builds/worker/checkouts/gecko/js/src/jit/BaselineIC.cpp:1705:10
    #<!-- -->5 0x349f24c7022e  (&lt;unknown module&gt;)

Full diff: https://github.com/llvm/llvm-project/pull/148278.diff

1 Files Affected:

  • (modified) compiler-rt/lib/asan/scripts/asan_symbolize.py (+30-17)
diff --git a/compiler-rt/lib/asan/scripts/asan_symbolize.py b/compiler-rt/lib/asan/scripts/asan_symbolize.py
index 058a1614b55e6..e70f987f03fe6 100755
--- a/compiler-rt/lib/asan/scripts/asan_symbolize.py
+++ b/compiler-rt/lib/asan/scripts/asan_symbolize.py
@@ -22,7 +22,6 @@
 import argparse
 import bisect
 import errno
-import getopt
 import logging
 import os
 import re
@@ -38,6 +37,7 @@
 allow_system_symbolizer = True
 force_system_symbolizer = False
 
+
 # FIXME: merge the code that calls fix_filename().
 def fix_filename(file_name):
     if fix_filename_patterns:
@@ -507,20 +507,29 @@ def symbolize_address(self, addr, binary, offset, arch):
         assert result
         return result
 
-    def get_symbolized_lines(self, symbolized_lines, inc_frame_counter=True):
+    def get_symbolized_lines(self, symbolized_lines):
         if not symbolized_lines:
-            if inc_frame_counter:
-                self.frame_no += 1
-            return [self.current_line]
-        else:
-            assert inc_frame_counter
-            result = []
-            for symbolized_frame in symbolized_lines:
-                result.append(
-                    "    #%s %s" % (str(self.frame_no), symbolized_frame.rstrip())
+            # If it is an unparsable frame, but contains a frame counter and address
+            # replace the frame counter so the stack is still consistent.
+            unknown_stack_frame_format = r"^( *#([0-9]+) +)(0x[0-9a-f]+) +.*"
+            match = re.match(unknown_stack_frame_format, self.current_line)
+            if match:
+                rewritten_line = (
+                    self.current_line[: match.start(2)]
+                    + str(self.frame_no)
+                    + self.current_line[match.end(2) :]
                 )
                 self.frame_no += 1
-            return result
+                return [rewritten_line]
+            # Not a frame line so don't increment the frame counter.
+            return [self.current_line]
+        result = []
+        for symbolized_frame in symbolized_lines:
+            result.append(
+                "    #%s %s" % (str(self.frame_no), symbolized_frame.rstrip())
+            )
+            self.frame_no += 1
+        return result
 
     def process_logfile(self):
         self.frame_no = 0
@@ -546,8 +555,7 @@ def process_line_posix(self, line):
         match = re.match(stack_trace_line_format, line)
         if not match:
             logging.debug('Line "{}" does not match regex'.format(line))
-            # Not a frame line so don't increment the frame counter.
-            return self.get_symbolized_lines(None, inc_frame_counter=False)
+            return self.get_symbolized_lines(None)
         logging.debug(line)
         _, frameno_str, addr, binary, offset = match.groups()
 
@@ -603,6 +611,7 @@ def _load_plugin_from_file_impl_py_gt_2(self, file_path, globals_space):
     def load_plugin_from_file(self, file_path):
         logging.info('Loading plugins from "{}"'.format(file_path))
         globals_space = dict(globals())
+
         # Provide function to register plugins
         def register_plugin(plugin):
             logging.info("Registering plugin %s", plugin.get_name())
@@ -779,9 +788,13 @@ def __str__(self):
             arch=self.arch,
             start_addr=self.start_addr,
             end_addr=self.end_addr,
-            module_path=self.module_path
-            if self.module_path == self.module_path_for_symbolization
-            else "{} ({})".format(self.module_path_for_symbolization, self.module_path),
+            module_path=(
+                self.module_path
+                if self.module_path == self.module_path_for_symbolization
+                else "{} ({})".format(
+                    self.module_path_for_symbolization, self.module_path
+                )
+            ),
             uuid=self.uuid,
         )
 

jschwartzentruber added a commit to MozillaSecurity/ffpuppet that referenced this pull request Jul 15, 2025
This can happen when JIT code is run, and we can't symbolize those
frames, but they should remain numbered in the stack.

see: llvm/llvm-project#148278
tysmith pushed a commit to MozillaSecurity/ffpuppet that referenced this pull request Jul 15, 2025
This can happen when JIT code is run, and we can't symbolize those
frames, but they should remain numbered in the stack.

see: llvm/llvm-project#148278
Copy link
Contributor

@davidmrdavid davidmrdavid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an OSS contributor, I don't have commit access, but wanted to leave some suggestions in case they help. Hope they're helpful!

Comment on lines 791 to 797
module_path=(
self.module_path
if self.module_path == self.module_path_for_symbolization
else "{} ({})".format(
self.module_path_for_symbolization, self.module_path
)
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just checking - there's no change here, just formatting, right?
Was this intended or applied by some kind of auto-formatter? Just wondering if we should minimize the diff by reverting this.

As an aside - this assignment to module_path is moderately complex. I wonder if we could make it simpler by expanding the ternary expression into a proper if-else block. But that's probably out of scope.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the result of running black on the file, since I'm touching the file. I can revert if minimal diff is preferred.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say authoritatively one way or another, but I'd imagine a minimal diff will make it easier to approve.

@@ -38,6 +37,7 @@
allow_system_symbolizer = True
force_system_symbolizer = False


# FIXME: merge the code that calls fix_filename().
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this FIXME still accurate? Otherwise, I'd expect this change to have no effect as it is no called as per the comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand your comment. A newline was added by black but otherwise this FIXME is unrelated to the PR, and I didn't touch it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I wasn't super clear. I realize that the PR did not add the FIXME :-) .
I was asking if you think the FIXME is no longer accurate because the FIXME implies that fix_filename is not being called (specifically, that the code that invokes it is not merged yet). However, if that were true, then your code would have no effect, so I think the FIXME is stale or wrong.

So I was wondering if you had context on it, in case we can remove the FIXME. That's all! Obviously, somewhat out of scope.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sorry. I don't know anything about the purpose of the function or the FIXME.

At the time the FIXME was added, the file already contained calls to fix_filename from the symbolize methods of a few Symbolizer subclasses. My guess is that the author hoped these callers could be merged into super().symbolize() instead of using a helper function. The implementations have gotten more complex since then, so I doubt the comment is still relevant.

@thurstond
Copy link
Contributor

Just an OSS contributor, I don't have commit access, but wanted to leave some suggestions in case they help. Hope they're helpful!

@davidmrdavid FYI since you've merged three pull requests (#125924, #122990, #117929) you're eligible for commit access if you wish: https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access
I'd be happy to +1 your commit access request based on your helpful comments :-)

@davidmrdavid
Copy link
Contributor

Thanks @thurstond, that would be wonderful! I'll look to follow those instructions later :) Thanks!

This can happen when JIT code is run, and we can't symbolize those
frames, but they should remain numbered in the stack.
@jschwartzentruber jschwartzentruber force-pushed the fix-asan-symbolize-unknown-frame branch from c3e8617 to 7aa8d35 Compare July 22, 2025 02:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants